Abstract: We investigate the recently introduced notion of 
smooth Renyi entropy for the case of ergodic information sources, 
thereby generalizing previous work which concentrated mainly 
on i.i.d. information sources. We will actually consider ergodic 
quantum information sources, of which ergodic classical infor- 
mation sources are a special case. We prove that the average 
smooth Renyi entropy rate will approach the entropy rate of a 
stationary, ergodic source, which is equal to the Shannon entropy 
rate for a classical source and the von Neumann entropy rate 
for a quantum source. 

I. Introduction 

The elegant notion of smooth Renyi entropy was introduced 
recently by Renner and Wolf in [6] for classical information 
sources, and the natural extension to quantum information 
sources was defined by Renner and Konig in [5]. In these two 
papers and further work by Renner and Wolf [7], [4], many 
properties of smooth Renyi entropy — and smooth min-entropy 
and smooth max-entropy in particular — have been studied in 
detail. 

A central property of smooth Renyi entropy proved in these 
works is that for memoryless (i.i.d.) information sources, the 
average smooth Renyi entropy rate will approach the entropy 
rate of the source, which is equal to the Shannon entropy for a 
classical source and the von Neumann entropy for a quantum 
source. In contrast, the average (conventional) Renyi 
entropy rate of a memoryless source does not, in general, 
converge to the source's entropy rate. 

In this paper we extend the study of smooth Renyi entropy 
to the more general class of stationary, ergodic sources rather 
than memoryless sources. We will prove, for both the 
classical and the quantum case, that the average smooth Renyi 
entropy rate will approach the Shannon and the von Neumann 
entropy rate, respectively. We will do so by first treating the 
classical case and then reducing the quantum case to the 
classical one without losing generality. 

In general, smooth Renyi entropy of order $\alpha > 1$, and 
$\alpha = \infty$ (min-entropy) in particular, is of cryptographic 
relevance (e.g., for randomness extraction), whereas smooth Renyi 
entropy of order $\alpha < 1$, and $\alpha = 0$ (max-entropy) in 
particular, is relevant to data compression (minimum encoding 
length). In these contexts, the importance of smooth Renyi entropy 
is that its rate is essentially equal to the Shannon/von Neumann 
entropy rate for an i.i.d. source (and for ergodic sources as well, 
as we show in this paper). This is not the case for conventional 
Renyi entropy. More generally, as shown in the papers by Renner et 
al. mentioned above, smooth Renyi entropy behaves much as 
Shannon/von Neumann entropy does. 

In this paper we focus on the unconditional case, whereas 
much of the abovementioned work by Renner et al. treats the 
more general conditional case. We leave the extension to the 
conditional case for future work. However, we do consider 
two notions of $\epsilon$-closeness, one based on trace distance 
(also known as variational or statistical distance) and one based 
on non-normalized density matrices (or probability distributions), 
where the latter is more suitable for handling the conditional 
case (see footnote 1). Thus, we believe that our results can be 
extended to the conditional case as well. 

We also note that Renner [4] presents a different kind of 
generalization of i.i.d. quantum sources, namely by analyzing 
the smooth min-entropy of symmetric (permutation-invariant) 
quantum states. More precisely, states in a symmetric 
subspace of $\mathcal{H}^{\otimes n}$ are considered, for 
$n \in \mathbb{N}$. See [4, Chapter 4] for details, which also 
covers the conditional case. 

II. Preliminaries 

Throughout this paper we use $P$ and $Q$ to denote probability 
distributions over the same finite or countably infinite 
range $\mathcal{Z}$. Similarly, we use $\rho$ and $\sigma$ to denote 
density matrices on the same Hilbert space of finite or countably 
infinite dimension. These probability distributions and density 
matrices are not necessarily normalized (e.g., $\sum_z P(z) < 1$ if 
$P$ is non-normalized and $\mathrm{tr}(\rho) < 1$ if $\rho$ is 
non-normalized). 

For ease of comparison we state all the preliminaries 
explicitly for the classical case as well as for the quantum 
case. 

Footnote 1: The trace distance was originally used in [6], [5]. 
The use of non-normalized probability distributions was also shown 
in the full version of [6] and used in [7]. In this paper, we extend 
this to the use of non-normalized density matrices in the quantum 
case. 

Definition 1 (Classical Renyi entropy): The Renyi entropy 
of order $\alpha \in [0, \infty]$ of a probability distribution $P$ is 

$$H_\alpha(P) = \frac{1}{1-\alpha} \log \sum_{z \in \mathcal{Z}} P(z)^\alpha,$$

for $0 < \alpha < \infty$, $\alpha \neq 1$, and 
$H_\alpha(P) = \lim_{\beta \to \alpha} H_\beta(P)$ otherwise. 

Hence, $H_0(P) = \log|\{z \in \mathcal{Z} : P(z) > 0\}|$, 
$H_1(P) = H(P)$ (Shannon entropy) and 
$H_\infty(P) = -\log \max_{z \in \mathcal{Z}} P(z)$. 
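
Definition 1 and its three limiting cases can be evaluated numerically as in the following minimal sketch. The function name `renyi_entropy` and the choice of natural logarithms (nats) are assumptions made for illustration; the input may be non-normalized, except for $\alpha = 1$, where the Shannon formula is only the correct limit for a normalized distribution.

```python
import math
from typing import Sequence


def renyi_entropy(p: Sequence[float], alpha: float) -> float:
    """Renyi entropy H_alpha(P) in nats for a finite distribution p.

    Illustrative helper (not from the paper). Zero entries are ignored.
    """
    support = [x for x in p if x > 0]
    if alpha == 0:
        # H_0: logarithm of the support size.
        return math.log(len(support))
    if alpha == 1:
        # H_1: Shannon entropy (limit alpha -> 1, normalized p assumed).
        return -sum(x * math.log(x) for x in support)
    if math.isinf(alpha):
        # H_inf: min-entropy.
        return -math.log(max(support))
    return math.log(sum(x ** alpha for x in support)) / (1.0 - alpha)
```

For a uniform distribution all orders coincide, while for a non-uniform one the value is non-increasing in $\alpha$.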

For a random variable $Z$ we use $H_\alpha(Z)$ as a shorthand for 
$H_\alpha(P_Z)$, where $P_Z$ is the probability distribution of $Z$. 

Smooth Renyi entropy was introduced in [6] for the classical 
case. For $\epsilon > 0$, let $\mathcal{B}^\epsilon(P)$ denote either 
the set of probability distributions which are $\epsilon$-close to $P$, 

$$\mathcal{B}^\epsilon(P) = \{Q : \delta(P, Q) \le \epsilon\},$$

or the set of non-normalized probability distributions which 
are $\epsilon$-close to $P$, 

$$\mathcal{B}^\epsilon(P) = \{Q : \sum_{z \in \mathcal{Z}} Q(z) \ge 1 - \epsilon,\ \forall z \in \mathcal{Z}\ 0 \le Q(z) \le P(z)\}.$$

The first notion of $\epsilon$-closeness, based on the statistical 
distance $\delta(P, Q) = \frac{1}{2} \sum_{z \in \mathcal{Z}} |P(z) - Q(z)|$, 
was used in [6]. The second notion was mentioned in the full version 
of [6], and used in [7]. 

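Definition 2 (Classical smooth Renyi entropy, [6]): The 
$\epsilon$-smooth Renyi entropy of order 
$\alpha \in [0, 1) \cup (1, \infty]$ of a probability distribution 
$P$ is 

$$H_\alpha^\epsilon(P) = \begin{cases} \inf_{Q \in \mathcal{B}^\epsilon(P)} H_\alpha(Q), & 0 \le \alpha < 1, \\ \sup_{Q \in \mathcal{B}^\epsilon(P)} H_\alpha(Q), & 1 < \alpha \le \infty. \end{cases}$$
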
At the end of this paper, we point out that $H_\alpha^\epsilon(P)$ 
will actually vary, depending on which notion of $\epsilon$-closeness 
is used, leading to a difference of magnitude up to 
$\left|\frac{\alpha}{\alpha-1}\log(1-\epsilon)\right|$ (see Section IV). 
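
As a rough illustration of Definition 2 for the two extreme orders, the sketch below evaluates the smooth min-entropy ($\alpha = \infty$) and the smooth max-entropy ($\alpha = 0$) of a finite, normalized distribution under the non-normalized notion of $\epsilon$-closeness. The function names and the two closed-form strategies (capping the largest probabilities at a common level, respectively discarding the smallest probabilities) are our own illustrative assumptions and are not taken from [6] or [7]; entropies are in nats.

```python
import math
from typing import Sequence


def _check_eps(eps: float) -> None:
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must satisfy 0 <= eps < 1")


def smooth_max_entropy(p: Sequence[float], eps: float) -> float:
    """H_0^eps(P): log of the smallest support whose mass is >= 1 - eps."""
    _check_eps(eps)
    mass, count = 0.0, 0
    for x in sorted(p, reverse=True):
        mass += x
        count += 1
        if mass >= 1.0 - eps - 1e-12:  # small tolerance for float sums
            break
    return math.log(count)


def smooth_min_entropy(p: Sequence[float], eps: float) -> float:
    """H_inf^eps(P): -log of the lowest cap c with sum(min(p, c)) = 1 - eps.

    Assumes p is normalized (sums to 1).
    """
    _check_eps(eps)
    q = sorted(p, reverse=True)
    tail = sum(q)
    for k in range(1, len(q) + 1):
        tail -= q[k - 1]                 # mass of the entries left uncapped
        cap = (1.0 - eps - tail) / k     # common level of the k capped entries
        next_p = q[k] if k < len(q) else 0.0
        if cap >= next_p:                # cap does not cut into the next entry
            return -math.log(cap)
    raise ValueError("no valid cap found; is p normalized?")
```

For example, with `p = [0.5, 0.25, 0.25]` and `eps = 0.1`, the largest entry is capped at 0.4, so the smooth min-entropy is `-log(0.4)`, compared with `log(2)` for `eps = 0`.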

For a probability distribution $P$ on, e.g., 
$\mathcal{Z} = \{0, 1\}^{\mathbb{N}}$, we define $P^n$ as the 
probability distribution corresponding to the restriction of the 
"infinite volume" distribution $P$ to the finite volume 
$\{0, \ldots, n-1\}$. 

Definition 3 (Entropy rate of a classical source): For a 
stationary source given by its probability measure $P$, we 
define 

$$h(P) = \lim_{n \to \infty} \frac{1}{n} H(P^n),$$

$$h_\alpha^\epsilon(P) = \lim_{n \to \infty} \frac{1}{n} H_\alpha^\epsilon(P^n).$$

We will actually prove that $h_\alpha^\epsilon(P)$ converges to $h(P)$ 
as $\epsilon \to 0$. 

We use the standard notion of typical sequences and typ- 
ical sets, which are defined for any information source (not 
necessarily i.i.d.). See, for instance, [2] or [3]. 

Definition 4 (Typical sequences, typical set): A sequence 
$z^n \in \{0, 1\}^n$, $n \in \mathbb{N}$, is called $\epsilon$-typical if 

$$e^{-n(h(P)+\epsilon)} \le P(z^n) \le e^{-n(h(P)-\epsilon)}.$$

The typical set $T_\epsilon^{(n)}$ is the set of all $\epsilon$-typical 
sequences from $\{0, 1\}^n$. 

In this paper we need the following consequence of the AEP, 
where we refer to [2, Section 16.8] for the AEP for ergodic 
sources (known as the Shannon-McMillan-Breiman theorem). 

Theorem 1 (Classical AEP bounds): Let $P$ be a stationary, 
ergodic probability distribution on 
$\mathcal{Z} = \{0, 1\}^{\mathbb{N}}$. Let $\epsilon > 0$. 
Then, for sufficiently large $n$, 

$$P(T_\epsilon^{(n)}) > 1 - \epsilon,$$

and 

$$|T_\epsilon^{(n)}| \le e^{n(h(P)+\epsilon)}.$$
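
The two AEP bounds can be explored numerically for a small stationary, ergodic source. The sketch below uses a hypothetical two-state Markov chain (the transition probabilities are chosen purely for illustration), enumerates all binary sequences of length $n$, and reports the size and probability mass of the typical set from Definition 4. The size bound holds for every $n$, whereas the mass bound is only guaranteed for sufficiently large $n$, so the small values of $n$ reachable by brute force need not satisfy it.

```python
import itertools
import math

# Hypothetical two-state chain; the transition probabilities are illustrative.
TRANSITIONS = ((0.9, 0.1), (0.2, 0.8))


def markov_typical_set(n: int, eps: float, trans=TRANSITIONS):
    """Return (h, size, mass) for the eps-typical set of length-n sequences."""
    a, b = trans[0][1], trans[1][0]
    pi = (b / (a + b), a / (a + b))  # stationary distribution of the chain
    # Entropy rate h(P) in nats: row entropies averaged under pi.
    h = -sum(pi[i] * trans[i][j] * math.log(trans[i][j])
             for i in (0, 1) for j in (0, 1))
    lower = math.exp(-n * (h + eps))
    upper = math.exp(-n * (h - eps))
    size, mass = 0, 0.0
    for z in itertools.product((0, 1), repeat=n):
        prob = pi[z[0]]
        for s, t in zip(z, z[1:]):
            prob *= trans[s][t]
        if lower <= prob <= upper:  # Definition 4
            size += 1
            mass += prob
    return h, size, mass
```

Calling `markov_typical_set(12, 0.1)` enumerates 4096 sequences; increasing `n` (at exponential cost) shows the typical-set mass growing towards 1.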



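Definition 5 (Quantum Renyi entropy): The Renyi entropy 
of order $\alpha \in [0, \infty]$ of a density matrix $\rho$ is 

$$S_\alpha(\rho) = \frac{1}{1-\alpha} \log \mathrm{tr}(\rho^\alpha),$$

for $0 < \alpha < \infty$, $\alpha \neq 1$, and 
$S_\alpha(\rho) = \lim_{\beta \to \alpha} S_\beta(\rho)$ otherwise. 

Hence, $S_0(\rho) = \log \mathrm{rank}(\rho)$, 
$S_1(\rho) = S(\rho) = -\mathrm{tr}(\rho \log \rho)$ 
(von Neumann entropy) and 
$S_\infty(\rho) = -\log \lambda_{\max}(\rho)$. 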

Analogous to the classical case, smooth Renyi entropy is 
defined in the quantum case (see [5]). We use either the set 
of density matrices which are $\epsilon$-close to $\rho$, 

$$\mathcal{B}^\epsilon(\rho) = \{\sigma : \delta(\rho, \sigma) \le \epsilon\},$$

or the set of non-normalized density matrices which are 
$\epsilon$-close to $\rho$, 

$$\mathcal{B}^\epsilon(\rho) = \{\sigma : \mathrm{tr}(\sigma) \ge 1 - \epsilon,\ 0 \le \sigma \le \rho\}.$$

The first notion of $\epsilon$-closeness, based on the trace distance 
$\delta(\rho, \sigma) = \frac{1}{2}\mathrm{tr}(|\rho - \sigma|)$, was 
used in [5]. The second notion is introduced here, and will actually 
be used in the next section. 

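Definition 6 (Quantum smooth Renyi entropy, [5]): The 
$\epsilon$-smooth Renyi entropy of order 
$\alpha \in [0, 1) \cup (1, \infty]$ of a density matrix $\rho$ is 

$$S_\alpha^\epsilon(\rho) = \begin{cases} \inf_{\sigma \in \mathcal{B}^\epsilon(\rho)} S_\alpha(\sigma), & 0 \le \alpha < 1, \\ \sup_{\sigma \in \mathcal{B}^\epsilon(\rho)} S_\alpha(\sigma), & 1 < \alpha \le \infty. \end{cases}$$
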
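Definition 7 (Entropy rates of a quantum source): For a 
stationary quantum source $\rho$, given by its local densities 
$\rho^{(n)} = \rho_{0,\ldots,n-1}$, for $n \in \mathbb{N}$, we define: 

$$s(\rho) = \lim_{n \to \infty} \frac{1}{n} S(\rho^{(n)}),$$

$$s_\alpha^\epsilon(\rho) = \lim_{n \to \infty} \frac{1}{n} S_\alpha^\epsilon(\rho^{(n)}).$$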

We use the following notion of typical states and typical 
subspaces, as can be found in [1] (also see [3]). 

Definition 8 (Typical state, typical subspace): A pure state 
$|e_i^{(n)}\rangle$, where $e_i^{(n)}$ is an eigenvector of 
$\rho^{(n)}$, is called $\epsilon$-typical if the corresponding 
eigenvalue $\lambda_i^{(n)}$ satisfies 

$$e^{-n(s(\rho)+\epsilon)} \le \lambda_i^{(n)} \le e^{-n(s(\rho)-\epsilon)}.$$

The typical subspace $\mathcal{T}_\epsilon^{(n)}$ is the subspace 
spanned by all $\epsilon$-typical states. 

We will need the following consequences of the quantum 
AEP for ergodic sources, which has been studied in [1] (see 
[3] for the quantum AEP for i.i.d. sources). 

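Theorem 2 (Quantum AEP bounds): Let $\rho$ be a stationary, 
ergodic quantum source with local densities $\rho^{(n)}$. Let 
$\epsilon > 0$. Then, for sufficiently large $n$, 

$$\mathrm{tr}(\rho^{(n)} P_{\mathcal{T}_\epsilon^{(n)}}) > 1 - \epsilon,$$

where $P_{\mathcal{T}_\epsilon^{(n)}}$ is the projector onto the 
subspace $\mathcal{T}_\epsilon^{(n)}$. Furthermore, 

$$\mathrm{tr}(P_{\mathcal{T}_\epsilon^{(n)}}) \le e^{n(s(\rho)+\epsilon)}.$$
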
Clearly, the quantum AEP for ergodic sources implies the 
classical AEP for ergodic sources. 

The following theorem by Renner and Wolf states that 
smooth Renyi entropy approaches Shannon entropy in the case 
of a classical i.i.d. source. 



Theorem 3 ([7, Lemma 1.2]): Let $Z^n$ denote an $n$-tuple of 
i.i.d. random variables with probability distribution $P_Z$. Then, 

$$\lim_{\epsilon \to 0} \lim_{n \to \infty} \frac{1}{n} H_\alpha^\epsilon(Z^n) = H(Z),$$

for any $\alpha \in [0, \infty]$. 
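
The contrast between conventional and smooth Renyi entropy in the i.i.d. case can be made concrete for $\alpha = \infty$. The sketch below computes the smooth min-entropy rate $\frac{1}{n} H_\infty^\epsilon(P^n)$ of a Bernoulli($p$) source under the non-normalized notion of $\epsilon$-closeness, grouping the $2^n$ sequences by their number of ones. The function name, the grouping trick and the capping strategy are illustrative assumptions rather than constructions from [7]; the plain float arithmetic is only intended for moderate $n$ (a few hundred).

```python
import math


def bernoulli_smooth_min_entropy_rate(n: int, p: float, eps: float) -> float:
    """(1/n) * H_inf^eps(P^n) in nats for an i.i.d. Bernoulli(p) source."""
    if not (0.0 < p < 1.0 and 0.0 <= eps < 1.0):
        raise ValueError("need 0 < p < 1 and 0 <= eps < 1")
    q = 1.0 - p
    # Sequences with k ones all have probability p^k q^(n-k); there are
    # C(n, k) of them. Sort the groups from most to least probable.
    groups = sorted(((p ** k * q ** (n - k), math.comb(n, k))
                     for k in range(n + 1)), reverse=True)
    top_mass, top_count = 0.0, 0
    for j, (value, mult) in enumerate(groups):
        top_mass += mult * value
        top_count += mult
        # Cap the top_count most probable sequences at a common level so
        # that exactly eps of probability mass is removed.
        cap = (top_mass - eps) / top_count
        next_value = groups[j + 1][0] if j + 1 < len(groups) else 0.0
        if cap >= next_value:
            return -math.log(cap) / n
    raise RuntimeError("unreachable for valid inputs")
```

With `eps = 0` the function returns the conventional min-entropy rate `-log(max(p, 1 - p))`, which does not depend on `n`. For a small positive `eps` and growing `n`, the returned rate moves towards the Shannon entropy `-p*log(p) - (1-p)*log(1-p)`, in line with Theorem 3.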

The analogous theorem by Renner and Konig for a quantum 
i.i.d. source is as follows. 

Theorem 4 ([5, Lemma 3]): Let $\rho$ be a density matrix. 
Then, 

$$\lim_{\epsilon \to 0} \lim_{n \to \infty} \frac{1}{n} S_\alpha^\epsilon(\rho^{\otimes n}) = S(\rho),$$

for any $\alpha \in [0, \infty]$. 

III. Main Result 

We extend the results by Renner and Wolf (Theorem 3 
above) and by Renner and Konig (Theorem 4 above) to the 
case of ergodic sources. Throughout this section, we use 
the notion of $\epsilon$-closeness based on non-normalized 
probability distributions and density matrices, so 

$$\mathcal{B}^\epsilon(P) = \{Q : \sum_{z \in \mathcal{Z}} Q(z) \ge 1 - \epsilon,\ \forall z \in \mathcal{Z}\ 0 \le Q(z) \le P(z)\}$$

and 

$$\mathcal{B}^\epsilon(\rho) = \{\sigma : \mathrm{tr}(\sigma) \ge 1 - \epsilon,\ 0 \le \sigma \le \rho\},$$

respectively. In the next section, we will argue that the results 
are independent of which notion of $\epsilon$-closeness is used. 

A. Classical Case 

We start with our main result for the classical case. The 
known result for an i.i.d. source is by Renner and Wolf, 
Theorem 3 above. We will extend this to a stationary, ergodic 
source in Theorem 5 below. 

Lemma 1: Let $P$ be a stationary, ergodic information source 
given by its probability measure and let $0 < \epsilon < 1/2$. Then 
we have 

$$h(P) - \epsilon \le h_\infty^\epsilon(P) \le h(P) + 2\epsilon.$$

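Proof: Let $0 < \epsilon < 1/2$. To prove the lower bound, we 
show that, for sufficiently large $n$, 
$H_\infty^\epsilon(P^n) \ge n(h(P) - \epsilon)$. 
Define the non-normalized probability distribution $Q$ for all 
$z^n \in \{0, 1\}^n$ by 

$$Q(z^n) = \begin{cases} P(z^n), & \text{if } z^n \in T_\epsilon^{(n)}, \\ 0, & \text{if } z^n \notin T_\epsilon^{(n)}. \end{cases} \qquad (1)$$

Clearly, $0 \le Q(z^n) \le P(z^n)$ and, by the AEP, 
$\sum_{z^n} Q(z^n) = P(T_\epsilon^{(n)}) > 1 - \epsilon$ for 
sufficiently large $n$. So, $Q \in \mathcal{B}^\epsilon(P^n)$. 
Furthermore, for $z^n \in T_\epsilon^{(n)}$, we have that 
$-\log P(z^n) \ge n(h(P) - \epsilon)$, and hence that for any $z^n$, 
$-\log Q(z^n) \ge n(h(P) - \epsilon)$. This implies that 
$H_\infty(Q) = -\log \max_{z^n} Q(z^n) \ge n(h(P) - \epsilon)$ and 
the lower bound follows. 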

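Next, to prove the upper bound, we show that, for sufficiently 
large $n$, one has that for all $Q \in \mathcal{B}^\epsilon(P^n)$, 

$$-\log \max_{z^n} Q(z^n) \le n(h(P) + 2\epsilon).$$

This follows from 
$\max_{z^n \in T_\epsilon^{(n)}} Q(z^n) \ge e^{-n(h(P)+2\epsilon)}$, 
which in turn follows from 
$\sum_{z^n \in T_\epsilon^{(n)}} Q(z^n) \ge |T_\epsilon^{(n)}| e^{-n(h(P)+2\epsilon)}$. 
From the AEP we get $|T_\epsilon^{(n)}| \le e^{n(h(P)+\epsilon)}$, 
hence it suffices to prove that, for sufficiently large $n$, 

$$\sum_{z^n \in T_\epsilon^{(n)}} Q(z^n) \ge e^{-n\epsilon}. \qquad (2)$$

As $\sum_{z^n} Q(z^n) \ge 1 - \epsilon$ for 
$Q \in \mathcal{B}^\epsilon(P^n)$, and also 
$\sum_{z^n \notin T_\epsilon^{(n)}} Q(z^n) < \epsilon$ (because 
$Q(z^n) \le P(z^n)$ and $P(T_\epsilon^{(n)}) > 1 - \epsilon$ from the 
AEP), we only need to observe that 

$$1 - 2\epsilon \ge e^{-n\epsilon}$$

holds for sufficiently large $n$, using that $\epsilon < 1/2$. ■ 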
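
The last step of the proof can be made explicit. The following worked inequality (natural logarithm, as in Definition 4) gives the threshold on $n$ implied by this step alone; the threshold required for the AEP bounds themselves depends on the source and may be larger.

```latex
% Threshold for the final step, valid for 0 < \epsilon < 1/2:
1 - 2\epsilon \ge e^{-n\epsilon}
\iff -n\epsilon \le \log(1 - 2\epsilon)
\iff n \ge \frac{1}{\epsilon}\,\log\frac{1}{1 - 2\epsilon}.
% Example: \epsilon = 0.1 gives n \ge 10\,\log(1.25) \approx 2.23, i.e. n \ge 3.
```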
We now state an analogous lemma for the max-entropy. 
Lemma 2: Let $P$ be a stationary, ergodic information source 
given by its probability measure and let $0 < \epsilon < 1/2$. Then 
we have 

$$h(P) - 2\epsilon \le h_0^\epsilon(P) \le h(P) + \epsilon.$$

Proof: Let $0 < \epsilon < 1/2$. To prove the upper bound, we 
show that, for sufficiently large $n$, 
$H_0^\epsilon(P^n) \le n(h(P) + \epsilon)$. We do so by showing that 
$H_0(Q) = \log|\{z^n : Q(z^n) > 0\}| \le n(h(P) + \epsilon)$ for the 
non-normalized probability distribution $Q$ defined by (1) in the 
proof of Lemma 1. As 

$$|\{z^n : Q(z^n) > 0\}| = |\{z^n \in T_\epsilon^{(n)} : Q(z^n) > 0\}| \le |T_\epsilon^{(n)}|,$$

the result follows directly from the AEP. 

Next, to prove the lower bound, we show that, for sufficiently 
large $n$, one has that for all $Q \in \mathcal{B}^\epsilon(P^n)$, 

$$\log|\{z^n : Q(z^n) > 0\}| \ge n(h(P) - 2\epsilon).$$

This is implied by 
$|\{z^n \in T_\epsilon^{(n)} : Q(z^n) > 0\}| \ge e^{n(h(P)-2\epsilon)}$, 
which is in turn implied by 

$$\sum_{z^n \in T_\epsilon^{(n)}} Q(z^n) \ge \max_{z^n \in T_\epsilon^{(n)},\, Q(z^n) > 0} Q(z^n) \cdot e^{n(h(P)-2\epsilon)}.$$

Using inequality (2) from the proof of Lemma 1, it suffices 
to show that 

$$\max_{z^n \in T_\epsilon^{(n)},\, Q(z^n) > 0} Q(z^n) \le e^{-n\epsilon} e^{-n(h(P)-2\epsilon)} = e^{-n(h(P)-\epsilon)}.$$

This is a direct consequence of the definition of $\epsilon$-typical 
sequences, as $Q(z^n) \le P(z^n)$ for $Q \in \mathcal{B}^\epsilon(P^n)$, 
using that $Q(z^n) > 0$ holds for at least one 
$z^n \in T_\epsilon^{(n)}$ on account of inequality (2). ■ 
Theorem 5: For $\alpha \in [0, \infty]$, the $\epsilon$-smooth entropy 
of a stationary, ergodic information source $P$ given by its 
probability measure on $\mathcal{Z} = \{0, 1\}^{\mathbb{N}}$ is close 
to the mean Shannon entropy: 

$$\lim_{\epsilon \to 0} h_\alpha^\epsilon(P) = h(P).$$


Proof: For $\alpha < 1$, the monotonicity of smooth Renyi 
entropy (see, e.g., [7, Lemma 1]) yields 
$H_\alpha^\epsilon(P^n) \le H_0^\epsilon(P^n)$, 
and hence $h_\alpha^\epsilon(P) \le h(P) + \epsilon$ by Lemma 2. 
To get a lower bound for $h_\alpha^\epsilon(P)$, we note that 

$$H_\alpha^\epsilon(P^n) \ge H_0^{2\epsilon}(P^n) - \frac{\log(1/\epsilon)}{1-\alpha},$$

using [7, Lemma 2]. So, $h_\alpha^\epsilon(P) \ge h_0^{2\epsilon}(P)$, 
as the constant term on the right-hand side vanishes after division 
by $n$ for $n \to \infty$. Using Lemma 2 
we thus get $h_\alpha^\epsilon(P) \ge h(P) - 4\epsilon$. 

This proves that $\lim_{\epsilon \to 0} h_\alpha^\epsilon(P) = h(P)$. 
The proof for $\alpha > 1$ is completely symmetrical, hence omitted. ■ 

Note that the term $2\epsilon$ in the upper and lower bounds of 
Lemmas 1 and 2, respectively, can be improved to $(1 + \delta)\epsilon$ 
for any constant $\delta > 0$. Similarly, the term $4\epsilon$ in the 
proof of Theorem 5 for the lower bound for $h_\alpha^\epsilon$ can be 
improved to $(1 + \delta)\epsilon$ for any constant $\delta > 0$. 

B. Quantum Case 

Although it is possible to prove the quantum case directly, 
along the same lines as in the classical case, we treat the 
quantum case indirectly, by reducing it to the classical case. 
This leads to a more compact proof. To this end, we will 
first prove Lemma 3 below, which captures the correspondence 
between $\mathcal{B}^\epsilon(\rho)$ and $\mathcal{B}^\epsilon(\lambda)$. 
We only consider the case of $\epsilon$-closeness for non-normalized 
density matrices and probability distributions (but the lemma also 
holds for the case of $\epsilon$-closeness based on trace distance). 

To prove our lemma, we need Weyl's monotonicity principle 
which we recall first. 

Theorem 6 (Weyl monotonicity): If $A$, $B$ are $m$ by $m$ 
Hermitian matrices and $B$ is positive, then 
$\lambda_i(A) \le \lambda_i(A + B)$ for all $i = 1, \ldots, m$, where 
$\lambda_i(M)$ is the $i$-th eigenvalue of $M$ (ordered from largest 
to smallest). 
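
Weyl's monotonicity principle is easy to check numerically in the smallest non-trivial case. The sketch below handles real symmetric 2 by 2 matrices, for which the eigenvalues have a closed form; the tuple encoding `(a, b, d)` for the matrix `[[a, b], [b, d]]` and both function names are assumptions made for this illustration only.

```python
import math
from typing import Tuple

Sym2 = Tuple[float, float, float]  # (a, b, d) encodes [[a, b], [b, d]]


def eigenvalues_2x2(m: Sym2) -> Tuple[float, float]:
    """Eigenvalues of a real symmetric 2x2 matrix, largest first."""
    a, b, d = m
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean + radius, mean - radius


def weyl_holds(A: Sym2, B: Sym2, tol: float = 1e-12) -> bool:
    """Check lambda_i(A) <= lambda_i(A + B) for positive semidefinite B."""
    if eigenvalues_2x2(B)[1] < -tol:
        raise ValueError("B must be positive semidefinite")
    total = (A[0] + B[0], A[1] + B[1], A[2] + B[2])
    return all(la <= ls + tol
               for la, ls in zip(eigenvalues_2x2(A), eigenvalues_2x2(total)))
```

For instance, `weyl_holds((0.5, 0.1, 0.2), (0.1, 0.05, 0.3))` compares the eigenvalues of `A` with those of `A + B` in decreasing order.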

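Lemma 3: Let $\rho$ be a density matrix with eigenvalues 
$\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_m$. 

1) For any density matrix $\sigma$ with eigenvalues 
$\mu_1 \ge \mu_2 \ge \ldots \ge \mu_m$, 

$$\sigma \in \mathcal{B}^\epsilon(\rho) \Rightarrow \mu \in \mathcal{B}^\epsilon(\lambda).$$

2) Given real numbers $\mu_1, \ldots, \mu_m$ such that 
$\mu \in \mathcal{B}^\epsilon(\lambda)$, there exists a matrix $\sigma$ 
with eigenvalues $\mu_1, \ldots, \mu_m$ such that 
$\sigma \in \mathcal{B}^\epsilon(\rho)$. 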

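Proof: We prove the result for 

$$\mathcal{B}^\epsilon(\lambda) = \{\mu : \sum_i \mu_i \ge 1 - \epsilon,\ \forall i\ 0 \le \mu_i \le \lambda_i\},$$

$$\mathcal{B}^\epsilon(\rho) = \{\sigma : \mathrm{tr}(\sigma) \ge 1 - \epsilon,\ 0 \le \sigma \le \rho\}.$$

For the first part, let $\sigma$ be a (possibly non-normalized) density 
matrix with eigenvalues $\mu_1 \ge \mu_2 \ge \ldots \ge \mu_m$ and 
suppose $\sigma \in \mathcal{B}^\epsilon(\rho)$. Since $\sigma$ is 
positive we have $\mu_i \ge 0$ for all $i$. And since 
$\sigma \le \rho$, we have that $\rho - \sigma$ is positive as well, 
so $\lambda_i \ge \mu_i$ for all $i$ (using Weyl's monotonicity 
principle, Theorem 6 above, with $A = \sigma$ and $B = \rho - \sigma$). 
Finally, note that $\mathrm{tr}(\sigma) \ge 1 - \epsilon$ is equivalent 
to $\sum_i \mu_i \ge 1 - \epsilon$, so we conclude that 
$\mu \in \mathcal{B}^\epsilon(\lambda)$. 

For the second part, let $\mu \in \mathcal{B}^\epsilon(\lambda)$ be 
given. We write the Hermitian matrix $\rho$ in diagonal form, 

$$\rho = \sum_i \lambda_i |v_i\rangle\langle v_i|,$$

for eigenvectors $v_i$ ($i = 1, \ldots, m$), and we show that the 
Hermitian matrix $\sigma$, defined by 

$$\sigma = \sum_i \mu_i |v_i\rangle\langle v_i|,$$

is in $\mathcal{B}^\epsilon(\rho)$. 

Since $\mu \in \mathcal{B}^\epsilon(\lambda)$, we have that 
$0 \le \mu_i \le \lambda_i$, and because $\rho$ and $\sigma$ commute 
(the eigenvalues of $\rho - \sigma$ are $\lambda_i - \mu_i$), we have 
$0 \le \sigma \le \rho$. Clearly, $\sum_i \mu_i \ge 1 - \epsilon$, so 
$\mathrm{tr}(\sigma) \ge 1 - \epsilon$ as well, and therefore 
$\sigma \in \mathcal{B}^\epsilon(\rho)$. ■ 

We now proceed to prove the main result for the quantum 
case. 

Theorem 7: For $\alpha \in [0, \infty]$, the $\epsilon$-smooth entropy 
of a stationary, ergodic quantum source $\rho$ given by its local 
densities $\rho^{(n)}$, for $n \in \mathbb{N}$, is close to the mean 
von Neumann entropy: 

$$\lim_{\epsilon \to 0} s_\alpha^\epsilon(\rho) = s(\rho).$$

Proof: We will apply Theorem 5 as follows. 

First note that for the local densities $\rho^{(n)}$ of a quantum 
information source $\rho$, we have that 
$S(\rho^{(n)}) = H(\lambda^{(n)})$, where $\lambda^{(n)}$ denotes the 
probability distribution corresponding to the eigenvalues of 
$\rho^{(n)}$. Consequently, $s(\rho) = h(\lambda)$ as well, where 
$\lambda$ denotes the probability distribution corresponding to the 
eigenvalues of $\rho$. 

Next, we recall the definitions of smooth Renyi entropy in 
the classical and quantum case, respectively: 

$$H_\alpha^\epsilon(P) = \begin{cases} \inf_{Q \in \mathcal{B}^\epsilon(P)} H_\alpha(Q), & 0 \le \alpha < 1, \\ \sup_{Q \in \mathcal{B}^\epsilon(P)} H_\alpha(Q), & 1 < \alpha \le \infty, \end{cases}$$

$$S_\alpha^\epsilon(\rho) = \begin{cases} \inf_{\sigma \in \mathcal{B}^\epsilon(\rho)} S_\alpha(\sigma), & 0 \le \alpha < 1, \\ \sup_{\sigma \in \mathcal{B}^\epsilon(\rho)} S_\alpha(\sigma), & 1 < \alpha \le \infty. \end{cases}$$

We only consider the case $\alpha < 1$, as the other case follows 
by symmetry. We have that 

$$S_\alpha^\epsilon(\rho^{(n)}) = \inf_{\sigma \in \mathcal{B}^\epsilon(\rho^{(n)})} S_\alpha(\sigma) = \inf_{\mu \in \mathcal{B}^\epsilon(\lambda^{(n)})} H_\alpha(\mu) = H_\alpha^\epsilon(\lambda^{(n)}),$$

using that Lemma 3 implies that the infimum over 
$\mathcal{B}^\epsilon(\rho^{(n)})$ is equal to the infimum over 
$\mathcal{B}^\epsilon(\lambda^{(n)})$. 

As a consequence, we have that 
$s_\alpha^\epsilon(\rho) = h_\alpha^\epsilon(\lambda)$ and the 
result follows from Theorem 5. Here, we use the fact that the 
quantum AEP implies the classical AEP. ■ 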

We note that the actual convergence rate (as a function 
of $\epsilon$) is the same as in the classical case, which follows by 
considering the analogues of Lemmas 1 and 2. 

IV. Notions of ε-Closeness 

As mentioned in the introduction, two notions of $\epsilon$-closeness 
were originally introduced by Renner and Wolf [6], [7], which 
can both be used in the definition of classical smooth Renyi 
entropy. For the quantum case, the paper by Renner and Konig 
[5] only considers the notion of $\epsilon$-closeness based on the 
trace distance. As the natural quantum analogue of the notion of 
$\epsilon$-closeness based on non-normalized probability distributions, 
we have used the set of non-normalized density matrices which 
are $\epsilon$-close to a given density matrix $\rho$: 

$$\mathcal{B}^\epsilon(\rho) = \{\sigma : \mathrm{tr}(\sigma) \ge 1 - \epsilon,\ 0 \le \sigma \le \rho\}.$$



The entropy rates (Definitions 3 and 7), and consequently 
the results for these entropy rates (Theorems 3, 4, 5, and 7), do 
not depend on which of these notions of $\epsilon$-closeness is used. 

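Furthermore, if the corresponding notions of $\epsilon$-closeness are 
used, the quantum case and the classical case are in general 
connected as follows: 

$$S_\alpha^\epsilon(\rho) = \inf_{\sigma \in \mathcal{B}^\epsilon(\rho)} S_\alpha(\sigma) = \inf_{\mu \in \mathcal{B}^\epsilon(\lambda)} H_\alpha(\mu) = H_\alpha^\epsilon(\lambda),$$

where $\lambda$ denotes the probability distribution corresponding to 
the eigenvalues of $\rho$. 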

We note, however, that the smooth Renyi entropy may 
depend on which notion of $\epsilon$-closeness is used, contrary to 
what was stated before (see, e.g., Section 3.3 of the full version 
of [6]). In general, one can show that 

$$0 \le \inf_{\delta(P,Q) \le \epsilon} H_\alpha(Q) - \inf_{\substack{\sum_z Q(z) \ge 1-\epsilon \\ \forall z\, 0 \le Q(z) \le P(z)}} H_\alpha(Q) \le \frac{\alpha}{\alpha-1} \log(1-\epsilon),$$

for $0 \le \alpha < 1$, and that 

$$\frac{\alpha}{\alpha-1} \log(1-\epsilon) \le \sup_{\delta(P,Q) \le \epsilon} H_\alpha(Q) - \sup_{\substack{\sum_z Q(z) \ge 1-\epsilon \\ \forall z\, 0 \le Q(z) \le P(z)}} H_\alpha(Q) \le 0,$$

for $1 < \alpha \le \infty$. So, only for $\alpha = 0$ do the two 
notions of $\epsilon$-closeness yield the same value for the smooth 
Renyi entropy $H_\alpha^\epsilon$. For all other values of $\alpha$, 
the difference may be as large in magnitude as 
$\frac{\alpha}{\alpha-1} \log(1-\epsilon)$. The maximum difference is 
attained for the uniform distribution $P(z) = 1/m$ on a finite range 
$\mathcal{Z}$ of size $m$, assuming that $\epsilon$ is sufficiently 
small (i.e., $\epsilon < 1/m$). 
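
For $1 < \alpha < \infty$ the uniform example can be worked out by hand. The computation below is a sketch under the assumption (justified by symmetry and convexity of $\sum_z Q(z)^\alpha$, but not spelled out in the text) that the supremum is attained at $Q = P$ for the trace-distance notion and at $Q = (1-\epsilon)P$ for the non-normalized notion.

```latex
% Uniform P(z) = 1/m, order 1 < \alpha < \infty.
H_\alpha(P) = \frac{1}{1-\alpha}\log\Big(m \cdot m^{-\alpha}\Big) = \log m,
\qquad
H_\alpha\big((1-\epsilon)P\big)
  = \frac{1}{1-\alpha}\log\Big(m \cdot \big(\tfrac{1-\epsilon}{m}\big)^{\alpha}\Big)
  = \log m + \frac{\alpha}{1-\alpha}\log(1-\epsilon).
% Difference of the two suprema under the stated assumption:
H_\alpha(P) - H_\alpha\big((1-\epsilon)P\big)
  = \frac{\alpha}{\alpha-1}\log(1-\epsilon) \le 0,
% which coincides with the lower bound stated for 1 < \alpha \le \infty.
```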